A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
Abstract
In this paper we describe our work in progress on the automatic development of a taxonomy of Spanish nouns, offer the Perl implementation we have so far, and discuss the problems that still need to be addressed. We designed a statistically based taxonomy induction algorithm that combines several strategies, none of which involves explicit linguistic knowledge. Although all of them are quantitative, the strategies differ in nature. Some are based on the computation of distributional similarity coefficients, which identify pairs of sibling words (co-hyponyms), while others are based on asymmetric co-occurrence and identify pairs of parent-child words (hypernym-hyponym relations). A decision-making process then combines the results of the previous steps and finally connects lexical units to a basic structure containing the most general categories of the language. We evaluate the quality of the taxonomy both manually and using Spanish Wordnet as a gold standard. We estimate an average of 89.07% precision and 25.49% recall when considering only the results that the algorithm presents with a high degree of certainty, or 77.86% precision and 33.72% recall when considering all results.
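The two families of strategies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Perl implementation. It is written in Python, and the abstract does not say which coefficients the paper uses, so the Jaccard coefficient and the difference of conditional probabilities below are stand-in assumptions. The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def context_sets(sentences):
    """Map each word to the set of other words it shares a sentence with."""
    contexts = defaultdict(set)
    for sentence in sentences:
        words = set(sentence)
        for word in words:
            contexts[word] |= words - {word}
    return contexts


def jaccard(a, b):
    """Symmetric similarity of two context sets (assumed coefficient)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def frequencies(sentences):
    """Count sentences containing each word and each unordered word pair."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for sentence in sentences:
        words = sorted(set(sentence))
        for word in words:
            single[word] += 1
        for a, b in combinations(words, 2):
            pair[(a, b)] += 1
    return single, pair


def asymmetry(child, parent, single, pair):
    """P(parent | child) - P(child | parent), an assumed asymmetry score.

    A clearly positive value means the candidate parent tends to appear
    whenever the child does, but not the other way round.
    """
    both = pair.get(tuple(sorted((child, parent))), 0)
    if both == 0:
        return 0.0
    return both / single[child] - both / single[parent]


# Hypothetical toy corpus: each inner list is one tokenized sentence.
sentences = [
    ["perro", "animal", "ladra"],
    ["gato", "animal", "maulla"],
    ["perro", "animal", "come"],
    ["gato", "animal", "come"],
    ["animal", "vive"],
]

contexts = context_sets(sentences)
single, pair = frequencies(sentences)

# Sibling (co-hyponym) candidates share many contexts.
sibling_score = jaccard(contexts["perro"], contexts["gato"])

# Parent-child (hypernym-hyponym) candidates co-occur asymmetrically.
parent_score = asymmetry("perro", "animal", single, pair)

print("perro ~ gato:", sibling_score)
print("perro -> animal:", parent_score)
```

In this toy setting the symmetric score suggests that "perro" and "gato" are siblings, while the asymmetric score suggests that "animal" is a parent of "perro". A real system would still need thresholds and the decision step described in the abstract to combine the two signals.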
Publication date: 2016